Skip to content

Conversation

@idegtiarenko
Copy link
Contributor

@idegtiarenko idegtiarenko commented Feb 4, 2025

Update code generation documentation with Aggs method signatures expected to be seen/used by the generated code

@idegtiarenko idegtiarenko added >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL v9.1.0 labels Feb 4, 2025
@idegtiarenko idegtiarenko requested a review from nik9000 February 4, 2025 10:26
Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

Copy link
Contributor

@ivancea ivancea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some remarks that could be worth adding. Maybe.
It looks good for now, thanks for improving this 🙏

* when both input type I and mutable accumulator state SS and GS are primitive (DOUBLE, INT).
* </li>
* <li>TBD explain {@code IntermediateState}</li>
* <li>TBD explain special internal state `seen`</li>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the warnings feature, there's also the "failed" state. Identical to seen. Never used in main though, only here: https://github.com/elastic/elasticsearch/pull/116170/files#diff-8a408014887a6dc87eed1f71346536fc77636245d5411714b6ba2cf265812538R18

Maybe worth mentioning, with the warnExceptions attribute

* Grouping aggregation expects:
* <ul>
* <li>type GS (a mutable state used to accumulate result of the grouping aggregation) to be public, not inner and implements {@link org.elasticsearch.compute.aggregation.GroupingAggregatorState}</li>
* <li>type I (input to your aggregation function), usually primitive types and {@link org.apache.lucene.util.BytesRef}</li>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you mean:

Suggested change
* <li>type I (input to your aggregation function), usually primitive types and {@link org.apache.lucene.util.BytesRef}</li>
* <li>type T (input to your aggregation function), usually primitive types and {@link org.apache.lucene.util.BytesRef}</li>

From the following comments.

@alex-spies alex-spies self-requested a review February 5, 2025 14:09
@idegtiarenko idegtiarenko marked this pull request as ready for review February 6, 2025 14:44
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks Ievgen!

My only observation is that we're adding specifics about how to implement the Aggregator and GroupingAggregator annotations into the package info - I think they should rather stay with the corresponding annotations. But that was already true before this PR to some extent, just that now we add even more details.

Comment on lines 119 to 136
* <h4>Before you start implementing it, please note that:</h4>
* <ul>
* <li>All methods must be public static</li>
* <li>
* init/initSingle/initGrouping could have optional BigArrays, DriverContext arguments that are going to be injected automatically.
* It is also possible to declare any number of arbitrary arguments that must be provided via generated Supplier.
* </li>
* <li>
* combine, combineStates, combineIntermediate, evaluateFinal methods (see below) could be omitted and generated automatically
* when both input type I and mutable accumulator state SS and GS are primitive (DOUBLE, INT).
* </li>
* <li>
* Code generation expects at least one IntermediateState field that is going to be used to keep
* the serialized state of the aggregation (eg AggregatorState and GroupingAggregatorState).
* It must be defined even if you rely on autogenerated implementation for the primitive types.
* </li>
* </ul>
* <h4>Aggregation expects:</h4>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The text here and below is very detailed and makes mentions of specific methods, parameters etc. This seems to have overlap with the javadoc of the org.elasticsearch.compute.ann.Aggregator which IMHO increases the chance of this javadoc getting quite stale, esp. if we have to make changes to the aggregator annotations.

I think it's better to say take a look at the annotation's description to see what methods they require you to implement + update the org.elasticsearch.compute.ann.Aggregator javadoc with the details provided here if the annotation's javadoc was insufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was treating it as an entry point into aggregation function manual.
Currently it contains some sections (re optional methods, parameters and public static requirement) that are common for both @Aggregator and @GroupingAggregator. For this reason I would like to keep it here for now.
I am separately looking into if it is worth merging above annotations. If that is possible this for sure should become a java doc.

* <h4>Aggregation expects:</h4>
* <ul>
* <li>
* type SS (a mutable state used to accumulate result of the aggregation) to be public, not inner and implements
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: IMHO it's worth writing out the usual type names. I think this is normally SingleState. (Also, the abbreviation SS can have unfortunate connotations.)

* <ul>
* <li>All methods must be public static</li>
* <li>
* init/initSingle/initGrouping could have optional BigArrays, DriverContext arguments that are going to be injected automatically.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: things like BigArrays and DriverContext could be linked to via {@link ...}. (Applies to this javadoc in general.)

* It is also possible to declare any number of arbitrary arguments that must be provided via generated Supplier.
* </li>
* <li>
* combine, combineStates, combineIntermediate, evaluateFinal methods (see below) could be omitted and generated automatically
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: I think we usually format mentions of methods with {@code ...}. (Also applies to this javadoc in general.)

* <h4>Grouping aggregation expects:</h4>
* <ul>
* <li>
* type GS (a mutable state used to accumulate result of the grouping aggregation) to be public, not inner and implements
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: GroupingState

@idegtiarenko idegtiarenko merged commit 58277e7 into elastic:main Feb 11, 2025
17 checks passed
@idegtiarenko idegtiarenko deleted the document_agg_code_generation branch February 11, 2025 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants